This directory contains documentation of the development of the
routines which implement the nonlocal pseudopotential operations in
response-function (RF) strain calculations. These were written by D.
R. Hamann in 2002-2003, and the were first included in the 4.2 release.
While conceptually straightforward, the expressions to be evaluated are
highly complex, and the only practical way to develop the code was to
use a symbolic-manipulation program and automated post-processing.
These tools are collected here, and their structure and usage are
explained as an aid to future developers who embark on related
projects.
*********************** INTRODUCTION *********************************
The underlying theoretical approach to treating strain within
density-functional perturbation theory is presented in strainpert.pdf,
which is a preprint that has been submitted to Phys. Rev. B. It is
also posted on arXiv.org as cond-mat/0409269. Before proceeding
further, it would be well to read this at least through Sec. III-E.
The equation numbers below refer to this preprint.
The application of the nonlocal psp's in ground-state (GS)
calculations to find wavefunctions, energies, forces, and stresses was
developed by D. C. Allan. He introduced the tensor formulation, Eq.
(51), and routines implementing Eq. (52) for nonlocal wave-function
operations, and the ground-state expectation values of Eq. (54) for
stresses. His approach was extended to RF atomic-displacement
calculations by Allan and X. Gonze. The necessary expressions were
derived and coded by hand in routines metcon, cont13, cont22, cont24,
etc. all in directory src/03nonlocal. These are all called by nonlop,
which is used in both GS and RF calculations. The new RF strain
routines were designed to parallel these existing routines as closely
as possible in their operations, arguments, and structures within which
they are called. The new "machine-coded" routines are the following:
To evaluate the action of the first strain derivatives of the psp's on
wave functions, Eq. (54):
metstr.F90
To evaluate the zero-order wave function expectation values of strain-
strain second derivatives, Eq. (55):
contstr21.F90
contstr22.F90
contstr23.F90
contstr24.F90
contstr25a.F90
contstr25.F90
contstr26.F90
To evaluate the zero-order wave function expectation values of mixed
strain -- atomic-displacement second derivatives ("internal strain"),
Eq. (56):
contistr01.F90
contistr03.F90
contistr10.F90
contistr12.F90
contistr21.F90
contistr30.F90
As discussed in Sec. III-E, the derivative nonlocal operators
couple "right (|K>)" and "left ( 6). Where there
are decimals, "d0" is appended to make sure we don't get single-
precision. (Note that integers, of which there are lots, will not
interfere with double-precision evaluation of these expressions.) The
one place we don't want integers is as divisors, so we restore double
precision (eg. /48. -> /48 -> /48.d0).
Now we have cleaned-up, acceptable Fortran statements, but have
many short lines which do not have proper free-form F90 continuation.
The sed output is saved to a temporary file and further processed by
format_code.x, the executable compiled from format_code.c, to fix the
continuation and "compactify" the Fortran. format_code.c simply copies
its standard input to its standard output one character at a time, but
deletes input newline characters. It counts and tests characters, and
inserts newlines, continuation &'s, and spaces to produce reasonably
uniform line lengths and follow Abinit indentation rules. It breaks
lines after )'s or the last of several adjacent )'s in an attempt to
make the code at least somewhat readable, not that anyone would really
want to. It also makes use of the "#" characters inserted in the M
programs to flag the start of each new "cm(i,j)=" statement.
The post-processed result of the excerpt of d2term1.out is shown
in d2term1.fort, and starts at line 97 in contstr21. This and the
other full routines start by computing the needed metric tensor
derivatives, evaluate all the necessary cm terms, save them for re-use
where possible, and do the actual operations on the aa and bb array
arguments at the very end.
The post-processing produced nearly error-free code. There was a
very small amount of hand-editing, no more than a few lines in each
routine at most to fix a few things that caused compiler errors. While
the c program inserted a warning comment (for possible hand editing) in
places when the number of continuation lines exceeded the Fortran 90
Standard maximum, none of the compilers that have been used since this
has been part of Abinit seemed to mind, so these comments were deleted.
********************** INTERFACE WITH NONLOP *************************
nonlop is a long and complicated routine, so a broad-brush
outline of what it does is in order. Basically, it has two halves. If
the input variable signs==1, only the first half is executed and
expectation values of the nonlocal psp's are output. If signs=2, both
halves are executed; the first to evaluate the "right" side of the
separable nonlocal potential (or its derivatives) operating on the
input wave function, and the second half to deliver plane wave matrix
elements of the "left" side based on coefficients evaluated in the
first half.
The computation of (form factors X tensors), (FFXT), acting on the
input wave function is done in the opernl* routines. (These routines
all do the same thing, but are coded differently for better performance
with different computer architectures.) The same routines also have an
"output" mode if needed in the second half of nonlop, but the discussion
below will focus on the "input" mode.
The opernl3 routine has a loop over wave-vector blocks, and calls
the routine mkffkg3 inside this loop. mkffkg3 internally loops over the
wave vectors in the current block with index ipw. The required
products of tensors, form factors, and form factor derivatives vary
widely depending on what is being done (the "choice" input variable),
and the number of projectors for each angular momentum for the current
type of atom. The first thing mkffkg3 does is to compute the required
tensor products of components each wave vector in the block, the
kpgx(ipw,ii) array. This corresponds exactly to the tensor definitions
in make_met2str.m (regarding ii as a one-dimensional index of the tnk
List). The remainder of the program packs the leading dimension of the
output array ffkg(iffkg,ipw) with FFXT's in an order determined by a
number of if's and do loops. There is no simple mapping of the
implicit composite of indices into iffkg. However, referring to the
tnk definition at the start of make_met2str.m, ffkg has sequential
groups of elements corresponding to the full range of j in each (*i,j*)
set.
opernl3 forms dot products on the plane wave index of ffkg and
the wave function (including a phase factor), running over the entire
index range iffkg=1,nffkg without regard to the composite nature of the
iffkg index. This is the most time-consuming part, and justifies the
arcane packing of ffkg. The remainder of the routine PARTIALLY UNPACKS
the dot-product(iffkg) array into several output arrays, mimicking the
if and do structure of mkffkg to recreate the composite identity of the
iffkg index. The reader is directed to the OUTPUT description in
opernl3 for details. The lead index of these arrays generally
corresponds to a segment of 1,nffkg.
Now we come to the rationale for the detailed discussion of the
last few paragraphs. Segments of the output arrays of operln3 are the
aa, bb arguments of our new routines for strain-perturbation RF
calculations. It was necessary to expand mkffkg3 to include the new
FFXT's needed for these calculations, and to expand the set of output
arrays and the unpacking part of opernl3 accordingly. The calling
statements for the strain routines must comply with the iffkg structure
(in its partially unpacked form) to pass the proper array segments as
aa, bb arguments.
The other significant addition to implement the nonlocal strain
operations was the need to extend mkffnl, the routine which computes
the Eq. (48) form factors and their derivatives, to include the second
derivatives. This is called by a number of routines prior to calling
nonlop.
We have discussed the "input" actions of opernl3, which produce
expectation values. The "output" mode producing wave functions utilizes
only one of the new routines, metstr, and further extensions of opernl3
(in the sign==-1, choice==3 section) to do the required "partial
repacking" into the full iffkg index range.
The alternative routines opernl2 (calling mkffkg) and opernl4a,
opernl4b function in the same manner as opernl3 and mkffkg3, with
differences such as index ordering and loop unrolling.